Journal: Proceedings. IEEE International Conference on Computer Vision
Article Title: Scaling Recurrent Models via Orthogonal Approximations in Tensor Trains
DOI: 10.1109/iccv.2019.01067
Figure Legend Snippet: (left) Mean squared error for different TT-ranks, using both the Riemannian formulation (3) and the approximate Stiefel formulation (4). (center) Effect of TT-rank on the per-iteration runtime of both methods; OTT is significantly (about 10x) faster than the Riemannian formulation. (right) Memory dependence of both the TT and OTT constructions as a function of rank; the OTT formulation allows for models roughly double the size of TT.
Article Snippet: We use a Riemannian gradient descent technique on this product of Stiefel manifolds $\mathcal{P}_S$. Given $\{Q_i^t(x_i)\}$ as the solution of the $t$-th step, the $(t+1)$-th solution $\{Q_i^{t+1}(x_i)\}$ can be computed using
$$\{Q_i^{t+1}(x_i)\} = \mathrm{Exp}\left(\{Q_i^t(x_i)\},\ \frac{\partial E}{\partial \{Q_j^t(x_j)\}}\right), \qquad (9)$$
where $\mathrm{Exp}$ is the Riemannian exponential map on $\mathcal{P}_S$. On $\mathcal{P}_S$, computing the Riemannian exponential map is not tractable and itself requires an optimization, so we instead use a Riemannian retraction map as proposed in [14]. summarizes this procedure.
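The snippet replaces the intractable exponential map in (9) with a retraction. A minimal sketch of one such update on a single Stiefel factor, using the standard QR-based retraction and tangent-space projection; the function names and the learning rate are illustrative assumptions, not the paper's code:

```python
import numpy as np

def project_tangent(Q, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold (matrices with orthonormal columns) at Q."""
    A = Q.T @ G
    return G - Q @ ((A + A.T) / 2)

def qr_retraction(Q, xi):
    """QR-based retraction: map the update Q + xi back onto the manifold."""
    Qn, R = np.linalg.qr(Q + xi)
    # Fix column signs so the QR factorization is unique (diag(R) >= 0).
    return Qn * np.sign(np.sign(np.diag(R)) + 0.5)

def riemannian_step(Q, G, lr=0.1):
    """One retraction-based Riemannian gradient descent step, standing in
    for the exponential-map update in equation (9)."""
    xi = -lr * project_tangent(Q, G)
    return qr_retraction(Q, xi)
```

On a product of Stiefel manifolds, this step is applied independently to each factor $Q_i$; the retracted iterate always has exactly orthonormal columns, which is what makes the update cheap compared to the exponential map.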
Techniques: Formulation